Accurate traffic flow prediction, a hotspot of intelligent transportation research, is a prerequisite for understanding traffic conditions and making travel plans. Traffic flow speed can be affected by road conditions, weather, holidays, etc. Furthermore, the sensors that capture traffic flow information are subject to interference from environmental factors such as illumination, collection time, and occlusion. Therefore, traffic flow in a practical transportation system is complicated, uncertain, and challenging to predict accurately. This paper proposes a deep encoder-decoder prediction framework based on variational Bayesian inference. A Bayesian neural network is constructed by combining variational inference with gated recurrent units (GRU) and is used as the deep neural network unit of the encoder-decoder framework to mine the intrinsic dynamics of traffic flow. Then, variational inference is introduced into the multi-head attention mechanism to avoid noise-induced deterioration of prediction accuracy. The proposed model achieves superior prediction performance over the benchmarks on the Guangzhou urban traffic flow dataset, particularly for long-term prediction.
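A minimal sketch of the variational-inference building block described above: a Bayes-by-backprop linear layer of the kind that could replace the deterministic transforms inside a GRU cell or an attention head. The layer names, shapes, and the standard-normal prior are illustrative assumptions, not the paper's exact construction.

import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.w_logvar = nn.Parameter(torch.full((out_dim, in_dim), -6.0))
        self.b = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        # Reparameterization trick: weights are sampled on every forward pass,
        # so repeated predictions expose the model's uncertainty.
        std = torch.exp(0.5 * self.w_logvar)
        w = self.w_mu + std * torch.randn_like(std)
        return x @ w.t() + self.b

    def kl(self):
        # KL divergence to a standard-normal prior, added to the training loss
        # (assumed prior; the paper may use a different one).
        return 0.5 * torch.sum(self.w_mu**2 + self.w_logvar.exp() - self.w_logvar - 1.0)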
Name ambiguity is common in academic digital libraries; for example, multiple authors may share the same name. This creates challenges for academic data management and analysis, so name disambiguation becomes necessary. Name disambiguation divides publications bearing the same author name into groups, each belonging to a unique author. The large amount of attribute information in publications makes traditional methods fall into the quagmire of feature selection. These methods often select attributes manually and weight them equally, which usually harms accuracy. The proposed method is mainly based on representation learning for heterogeneous networks and clustering, and it exploits self-attention to solve the problem. The representation of a publication is a synthesis of structural and semantic representations. The structural representation is obtained by meta-path-based sampling and a skip-gram-based embedding method, and meta-path level attention is introduced to automatically learn the weight of each feature. The semantic representation is generated using NLP tools. Our proposal outperforms the baselines in name disambiguation accuracy, and ablation experiments demonstrate the improvements brought by feature selection and meta-path level attention. The experimental results show the superiority of our method in capturing the most informative attributes from publications and reducing the impact of redundant information.
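A minimal sketch of what meta-path level attention can look like: per-meta-path node embeddings are scored, softmax-normalized, and fused into a single structural representation. The two-layer scorer, hidden size, and mean aggregation over nodes are assumptions for illustration.

import torch
import torch.nn as nn

class MetaPathAttention(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1, bias=False)
        )

    def forward(self, z):                  # z: (num_meta_paths, num_nodes, dim)
        w = self.score(z).mean(dim=1)      # one score per meta-path, averaged over nodes
        beta = torch.softmax(w, dim=0)     # learned weight of each meta-path
        return (beta.unsqueeze(1) * z).sum(dim=0)   # fused (num_nodes, dim) representation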
Tensor program tuning is a non-convex objective optimization problem for which search-based approaches have proven effective. At the core of search-based approaches lies the design of the cost model. Although deep learning-based cost models perform significantly better than other methods, they still fall short and suffer from the following problems. First, their feature extraction relies heavily on expert-level domain knowledge of hardware architectures. Even so, the extracted features are often unsatisfactory and require separate treatment for CPUs and GPUs. Second, a cost model trained on one hardware platform usually performs poorly on another, a problem we call cross-hardware unavailability. To address these problems, we propose TLP and MTL-TLP. TLP is a deep learning-based cost model that facilitates tensor program tuning. Instead of extracting features from the tensor program itself, TLP extracts features from the schedule primitives. We treat the schedule primitives as a tensor language; TLP is thus a Tensor Language Processing task. In this way, predicting tensor program latency with the cost model is transformed into a natural language processing (NLP) regression task. MTL-TLP combines multi-task learning with TLP to cope with the cross-hardware unavailability problem. We incorporate these techniques into the Ansor framework and conduct detailed experiments. Results show that TLP speeds up the average search time by 9.1X and 3.0X on CPU and GPU workloads, respectively, compared to the state-of-the-art implementation. MTL-TLP achieves speed-ups of 4.7X and 2.9X on CPU and GPU workloads, respectively, using only 7% of the target hardware data.
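A minimal sketch of the "schedule primitives as a language" idea: primitive tokens are embedded and fed to a small Transformer encoder that regresses a latency score. The vocabulary size, model depth, and mean pooling are assumptions and not TLP's exact architecture.

import torch
import torch.nn as nn

class TinyLatencyModel(nn.Module):
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 1)

    def forward(self, token_ids):               # (batch, seq_len) schedule-primitive token ids
        h = self.encoder(self.embed(token_ids))
        return self.head(h.mean(dim=1)).squeeze(-1)   # predicted latency score per program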
Left-ventricular ejection fraction (LVEF) is an important indicator of heart failure. Existing methods for LVEF estimation from video require large amounts of annotated data to achieve high performance, e.g. using 10,030 labeled echocardiogram videos to achieve a mean absolute error (MAE) of 4.10. However, labeling these videos is time-consuming and limits potential downstream applications to other heart diseases. This paper presents the first semi-supervised approach for LVEF prediction. Unlike general video prediction tasks, LVEF prediction is specifically related to changes in the left ventricle (LV) in echocardiogram videos. By incorporating knowledge learned from predicting LV segmentations into LVEF regression, we can provide additional context to the model for better predictions. To this end, we propose a novel Cyclical Self-Supervision (CSS) method for learning video-based LV segmentation, motivated by the observation that the heartbeat is a cyclical process with temporal repetition. Prediction masks from our segmentation model can then be used as additional input for LVEF regression to provide spatial context for the LV region. We also introduce teacher-student distillation to distill the information from LV segmentation masks into an end-to-end LVEF regression model that only requires video inputs. Results show that our method outperforms alternative semi-supervised methods and achieves an MAE of 4.17, which is competitive with state-of-the-art supervised performance while using half the number of labels. Validation on an external dataset also shows improved generalization from using our method. Our code is available at https://github.com/xmed-lab/CSS-SemiVideo.
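A minimal sketch of the teacher-student distillation step: the student sees only the video, and its regression output is pulled toward both the LVEF label and the prediction of a teacher that additionally received segmentation masks. The L1 losses and the 0.5 weight are assumptions, not the paper's exact objective.

import torch
import torch.nn.functional as F

def distillation_loss(student_pred, teacher_pred, lvef_label, alpha=0.5):
    supervised = F.l1_loss(student_pred, lvef_label)           # MAE against the ground-truth LVEF
    distill = F.l1_loss(student_pred, teacher_pred.detach())   # match the mask-aware teacher
    return supervised + alpha * distill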
Fully supervised semantic segmentation learns from dense masks, which incurs a heavy closed-set annotation cost. In this paper, we use natural language as supervision, without any pixel-level annotation, for open-world segmentation. We call the proposed framework FreeSeg, in which masks are obtained for free from the raw feature maps of a pretrained model. Compared with zero-shot or open-set segmentation, FreeSeg requires no annotated masks and can broadly predict categories beyond class-agnostic unsupervised segmentation. Specifically, FreeSeg obtains free masks from the image-text similarity map (ITSM) of an interpretable contrastive language-image pretraining model (ICLIP). Our core improvements are smoothed min pooling for dense ICLIP, together with partial-label and pixel-level strategies for segmentation. Moreover, FreeSeg is simple, with no complex designs such as grouping, clustering, or retrieval. Besides its simplicity, FreeSeg surpasses previous state-of-the-art methods by clear margins, e.g., by 13.4% mIoU under the same setting.
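A minimal sketch of extracting a free mask from an image-text similarity map: dense, normalized patch features are compared against a class text embedding and the resulting similarity map is thresholded. The min-max normalization and the 0.5 threshold are assumptions for illustration.

import torch

def free_mask(dense_feats, text_emb, thresh=0.5):
    # dense_feats: (H*W, D) L2-normalized patch features; text_emb: (D,) normalized class embedding
    itsm = dense_feats @ text_emb                                # (H*W,) image-text similarity map
    itsm = (itsm - itsm.min()) / (itsm.max() - itsm.min() + 1e-6)
    return (itsm > thresh).float()                               # binary pseudo-mask for the class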
Contrastive Language-Image Pretraining (CLIP) learns rich representations through readily available natural language supervision. It improves general performance on downstream vision tasks, including but not limited to zero-shot recognition, long-tailed classification, segmentation, retrieval, captioning, and video understanding. However, to the best of our knowledge, the visual interpretability of CLIP has not yet been studied. To provide visual explanations of its predictions, we propose the Image-Text Similarity Map (ITSM). Based on it, we surprisingly find that CLIP prefers background regions over the foreground and produces visualizations that are erroneous with respect to human understanding. Experimentally, we find that the devil is in the pooling part, where inappropriate pooling methods lead to a phenomenon we call semantic shift. To correct and improve the visualization results, we propose masked max pooling using attention maps from a self-supervised image encoder. Meanwhile, the interpretability task and the recognition task require different representations, so we propose dual projections to meet this requirement. We integrate the above methods as Interpretable Contrastive Language-Image Pretraining (ICLIP). Experiments show that ICLIP greatly improves interpretability, e.g., by nontrivial margins of 32.85% and 49.10%, respectively, on the VOC 2012 dataset.
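A minimal sketch of masked max pooling: patch features are pooled only where a self-supervised attention map is high, instead of global pooling that can drift toward the background. The 0.6 attention threshold and the plain-max fallback are assumptions.

import torch

def masked_max_pool(patch_feats, attn_map, thresh=0.6):
    # patch_feats: (N, D) patch embeddings; attn_map: (N,) attention values in [0, 1]
    keep = attn_map >= thresh
    if keep.sum() == 0:                        # no patch passes the threshold: fall back to plain max pooling
        return patch_feats.max(dim=0).values
    return patch_feats[keep].max(dim=0).values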
Despite recent advances in image enhancement, existing methods still struggle to adapt the brightness and contrast of both low-light and normal-light images. To address this problem, we propose a novel 2D histogram equalization method. It assumes that intensity occurrence and co-occurrence are interdependent, and it derives the distribution of intensity occurrence (the 1D histogram) by marginalizing over the distribution of intensity co-occurrence (the 2D histogram). This scheme improves global contrast more effectively and reduces noise amplification. The 2D histogram is defined by incorporating local pixel-value differences in image reflectance into the density estimation, to mitigate the adverse effects of dark illumination conditions. Evaluation on more than 500 images demonstrates that our method outperforms existing studies. It can sufficiently enhance the brightness of low-light images while avoiding over-enhancement of normal-light images.
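A minimal sketch of the overall pipeline under simplifying assumptions: build a 2D co-occurrence histogram from neighboring pixel pairs, marginalize it to a 1D histogram, and equalize with its CDF. The 8-bit range and the horizontal-neighbor pairing (rather than a reflectance-weighted co-occurrence) are assumptions.

import numpy as np

def hist2d_equalize(img):                      # img: uint8 grayscale array
    pairs = np.stack([img[:, :-1].ravel(), img[:, 1:].ravel()], axis=1)
    h2d, _, _ = np.histogram2d(pairs[:, 0], pairs[:, 1], bins=256, range=[[0, 256], [0, 256]])
    h1d = h2d.sum(axis=1)                      # marginalize co-occurrence -> occurrence histogram
    cdf = np.cumsum(h1d) / h1d.sum()
    mapping = np.round(255 * cdf).astype(np.uint8)
    return mapping[img]                        # remap intensities with the equalizing transfer function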
Existing image enhancement methods fall short of expectations because it is difficult for them to improve global and local image contrast simultaneously. To address this problem, we propose a histogram equalization based method that adapts to the data-dependent requirements of brightness enhancement and improves the visibility of details without losing global contrast. The method incorporates the spatial information provided by image context into the density estimation for discriminative histogram equalization. To minimize the adverse effects of non-uniform illumination, we propose defining the spatial information on image reflectance estimated with edge-preserving smoothing. Our method is particularly suitable for determining how the background brightness should be adjusted and for revealing useful image details hidden in the dark.
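A minimal sketch of one way spatial context could enter the histogram: estimate illumination with an edge-preserving filter, take reflectance as the ratio, and weight each pixel's histogram contribution by its local reflectance detail. The bilateral filter parameters and the Laplacian-based weight are assumptions, not the paper's exact definition.

import cv2
import numpy as np

def contextual_histogram(img):                 # img: uint8 grayscale array
    illum = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75).astype(np.float32)
    reflect = img.astype(np.float32) / (illum + 1.0)                 # crude reflectance estimate
    weight = np.abs(cv2.Laplacian(reflect, cv2.CV_32F)) + 1e-3       # local detail weight per pixel
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256), weights=weight.ravel())
    return hist / hist.sum()                   # spatially weighted intensity distribution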
Video shadow detection aims to produce consistent shadow predictions across video frames. However, current approaches suffer from inconsistent shadow predictions across frames, especially when the illumination and background textures change within a video. We observe that the inconsistent predictions are caused by shadow feature inconsistency, i.e., the features of the same shadow region behave differently between nearby frames. In this paper, we propose a novel shadow correspondence method (SC-Cor) that enhances the pixel-wise similarity of specific shadow regions across frames for video shadow detection. Our proposed SC-Cor has three main advantages. First, without requiring dense pixel-to-pixel correspondence labels, SC-Cor can learn pixel-wise correspondence across frames in a weakly supervised manner. Second, SC-Cor considers intra-shadow separability, which is robust to the varying textures and illumination in videos. Finally, SC-Cor is a plug-and-play module that can be easily integrated without extra computational cost. We further design a new evaluation metric to assess the temporal stability of video shadow detection results. Experimental results show that SC-Cor outperforms the previous state-of-the-art method by 6.51% in IoU and by 3.35% on the newly introduced temporal stability metric.
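A minimal sketch of a cross-frame shadow correspondence term: features of shadow pixels in one frame are pulled toward their most similar shadow features in the next frame. The cosine similarity and the simple best-match pairing are assumptions, not the exact SC-Cor objective.

import torch
import torch.nn.functional as F

def correspondence_loss(feat_t, feat_t1, shadow_t, shadow_t1):
    # feat_*: (N, D) per-pixel features; shadow_*: (N,) boolean shadow pseudo-masks per frame
    a = F.normalize(feat_t[shadow_t], dim=1)
    b = F.normalize(feat_t1[shadow_t1], dim=1)
    if a.numel() == 0 or b.numel() == 0:
        return feat_t.sum() * 0.0              # no shadow pixels in a frame: zero loss, keeps the graph valid
    sim = a @ b.t()                            # cross-frame similarity between shadow pixels
    return (1.0 - sim.max(dim=1).values).mean()   # pull each pixel toward its best match in the next frame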
Focal loss has gained remarkable popularity because it uses a simple technique to identify and exploit hard examples to achieve better classification performance. However, this method does not generalize easily beyond classification tasks, such as keypoint detection. In this paper, we propose a novel adaptation of focal loss to keypoint detection tasks, called Adversarial Focal Loss (AFL). AFL is not only semantically analogous to focal loss but also serves as a plug-and-play upgrade for arbitrary loss functions. Whereas focal loss requires the output of a classifier, AFL leverages a separate adversarial network to produce a difficulty score for each input. This difficulty score can then be used to prioritize learning on hard examples, even in the absence of a classifier. In this work, we demonstrate AFL's effectiveness in enhancing existing methods for keypoint detection and verify its ability to reweight examples according to difficulty.
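A minimal sketch of the reweighting idea behind AFL: a separate network scores how difficult each input is, and that score scales an arbitrary per-sample base loss so hard examples contribute more to training. The detach and the (1 + score) weighting form are illustrative assumptions.

import torch

def adversarial_focal_loss(per_sample_loss, difficulty_scores):
    # per_sample_loss: (batch,) values of any base loss; difficulty_scores: (batch,) in [0, 1]
    weights = 1.0 + difficulty_scores.detach()     # harder examples receive larger weights
    return (weights * per_sample_loss).mean()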